Thesaurus-Based Index Term Extraction for Agricultural Documents
Authors
Abstract
This paper describes a new algorithm for automatically extracting index terms from documents relating to the domain of agriculture. The domain-specific Agrovoc thesaurus developed by the FAO is used both as a controlled vocabulary and as a knowledge base for semantic matching. The automatically assigned terms are evaluated against a manually indexed 200-item sample of the FAO’s document repository, and the performance of the new algorithm is compared with a state-of-the-art system for keyphrase extraction.
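As a rough illustration of the idea in the abstract, the sketch below matches word n-grams from a document against a controlled vocabulary and scores the result against manually assigned terms. It is a toy sketch under stated assumptions, not the paper's algorithm: the vocabulary entries, the synonym mapping, the function names `candidate_terms` and `precision_recall`, and the sample document are all hypothetical, and no semantic matching or term ranking is attempted.

```python
import re

# Hypothetical toy vocabulary: maps a normalized phrase to a preferred descriptor.
# These entries are illustrative assumptions, not actual Agrovoc data.
VOCABULARY = {
    "maize": "maize",
    "corn": "maize",  # assumed synonym that points to a preferred term
    "soil erosion": "soil erosion",
    "crop rotation": "crop rotation",
}


def candidate_terms(text, vocabulary, max_words=3):
    """Return the preferred descriptors whose phrases occur in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    found = set()
    # Slide windows of 1..max_words words over the text and look each one up.
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            if phrase in vocabulary:
                found.add(vocabulary[phrase])
    return found


def precision_recall(assigned, manual):
    """Compare automatically assigned terms with manually assigned ones."""
    correct = len(assigned & manual)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(manual) if manual else 0.0
    return precision, recall


# Hypothetical document and manual index terms, for illustration only.
doc = "Corn yields fell where soil erosion was severe."
assigned = candidate_terms(doc, VOCABULARY)
manual = {"maize", "soil erosion", "yields"}
precision, recall = precision_recall(assigned, manual)
```

Restricting candidates to vocabulary entries means every assigned term is a valid index term by construction; the cost is that concepts missing from the vocabulary (here, "yields") can never be recovered.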
Similar resources
A Method for Keyword Extraction and Word Weighting to Improve Persian Text Classification
Given the ever-increasing expansion of information and the huge number of existing unstructured documents, keywords play a very important role in information retrieval. Because manual keyword extraction faces various challenges, automated extraction seems inevitable. This research attempts to use a thesaurus (a structured word net) to extract keywords automatically. A...
Context based Indexing in Search Engines using Ontology
Indexing in search engines is an active area of current research. The main aim of a search engine is to provide the most relevant documents to users in the minimum possible time, so granting efficient and fast access to the index is a major issue for the performance of Web search engines. Indexing is performed on the web pages after they have been gathered into a repository by the crawler. Th...
Automatic Term Identification by User Profile for Document Categorisation in Medline
We show how term extraction methods such as AMTEX and MMTX can be used for the automatic categorisation of medical documents by user profile (novice users and experts). This is achieved by mapping document terms to external lexical resources such as WordNet, and MeSH (the medical thesaurus of NLM).
IAALD AFITA WCCA2008 WORLD CONFERENCE ON AGRICULTURAL INFORMATION AND IT Thesaurus and Ontology Technology for the Improvement of Agricultural Information Retrieval
We are in the web information age, and new information management technologies can support better agricultural development. The paper introduces research on agricultural thesauri and ontologies, which could improve agricultural information retrieval. The main work includes converting the Chinese Agricultural Thesaurus (CAT) to an agricultural ontology; this can use traditional domain know...
Methodology for Transforming a Thesaurus into a Domain Ontology
Information retrieval techniques make use of terms that are automatically extracted from documents; these terms are used to provide access to information. In this paper we propose an approach to semantically enrich this extraction by adding knowledge from a thesaurus. More specifically, the methodology we promote aims at transforming a thesaurus into a domain ontology, which will then be u...